
    Distributional composition using higher-order dependency vectors

    This paper concerns how to apply compositional methods to vectors built from grammatical dependency relations. We demonstrate the potential of a novel approach which uses higher-order grammatical dependency relations as features. We apply the approach to adjective-noun compounds, with promising results in predicting the vectors of (held-out) observed phrases.
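To make "higher-order grammatical dependency relations as features" concrete, here is a minimal sketch, not the paper's method. It treats a parsed sentence as a graph and, for a target token, counts every dependency path of up to `max_order` edges as a feature. The edge format, the `path_features` name, and the leading underscore that marks an edge followed from dependent to head are all hypothetical choices made for illustration.

```python
from collections import deque
from typing import Dict, List, Tuple

# Hypothetical edge format: (head index, relation label, dependent index).
Edge = Tuple[int, str, int]


def path_features(tokens: List[str], edges: List[Edge], target: int,
                  max_order: int = 2) -> Dict[str, int]:
    """Count dependency paths of length 1..max_order starting at `target`.

    A feature is written 'rel1/rel2:word'. An edge followed from dependent
    to head gets a leading underscore (a hypothetical notation).
    """
    adjacency: Dict[int, List[Tuple[str, int]]] = {i: [] for i in range(len(tokens))}
    for head, rel, dep in edges:
        adjacency[head].append((rel, dep))        # head -> dependent
        adjacency[dep].append(("_" + rel, head))  # dependent -> head (inverse)

    features: Dict[str, int] = {}
    # Breadth-first search; each item carries the path so far and the
    # tokens already visited, so no path revisits a token.
    queue = deque([(target, (), frozenset([target]))])
    while queue:
        node, path, seen = queue.popleft()
        if path:
            key = "/".join(path) + ":" + tokens[node]
            features[key] = features.get(key, 0) + 1
        if len(path) == max_order:
            continue
        for label, nxt in adjacency[node]:
            if nxt not in seen:
                queue.append((nxt, path + (label,), seen | {nxt}))
    return features


# Toy parse of "white cats sleep": amod(cats, white), nsubj(sleep, cats).
tokens = ["white", "cats", "sleep"]
edges = [(1, "amod", 0), (2, "nsubj", 1)]
print(path_features(tokens, edges, target=0))
# {'_amod:cats': 1, '_amod/_nsubj:sleep': 1}
```

The second-order feature `_amod/_nsubj:sleep` records that the noun modified by *white* is the subject of *sleep*, information that a first-order dependency vector for *white* would not contain.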

    Learning to distinguish hypernyms and co-hyponyms

    This work is concerned with distinguishing different semantic relations which exist between distributionally similar words. We compare a novel approach, based on training a linear Support Vector Machine on pairs of feature vectors, with state-of-the-art methods based on distributional similarity. We show that the new supervised approach performs better even when there is minimal information about the target words in the training data, giving a 15% reduction in error rate over unsupervised approaches.
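One way to train "a linear Support Vector Machine on pairs of feature vectors" is sketched below, under assumptions the abstract does not state. Each word pair is represented by concatenating the two word vectors (other pair encodings, such as the vector difference, are possible), and the SVM is trained with Pegasos-style stochastic subgradient descent on the regularised hinge loss. The toy vectors and labels (+1 for hypernym pairs, -1 for co-hyponym pairs) are invented for illustration and are not data from the paper; all function names are hypothetical.

```python
import random
from typing import List, Sequence

Vector = List[float]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def pair_features(u: Sequence[float], v: Sequence[float]) -> Vector:
    """Concatenate the two word vectors, plus a constant 1.0 so that the
    bias is learned as an ordinary weight (an assumed encoding)."""
    return list(u) + list(v) + [1.0]


def train_linear_svm(X: List[Vector], y: List[int], lam: float = 0.01,
                     epochs: int = 300, seed: int = 0) -> Vector:
    """Pegasos-style SGD on lam/2 * ||w||^2 + mean hinge loss; y is +1 or -1."""
    rng = random.Random(seed)
    w = [0.0] * len(X[0])
    order = list(range(len(X)))
    t = 0
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)                      # decreasing step size
            violated = y[i] * dot(w, X[i]) < 1.0       # hinge loss active?
            w = [(1.0 - eta * lam) * wj for wj in w]   # shrink: regulariser step
            if violated:                               # hinge subgradient step
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, X[i])]
    return w


def predict(w: Vector, u: Sequence[float], v: Sequence[float]) -> int:
    return 1 if dot(w, pair_features(u, v)) >= 0.0 else -1


# Invented toy pairs: (u, v, label), +1 = hypernym pair, -1 = co-hyponym pair.
pairs = [
    ([1.0, 0.0], [0.0, 1.0], 1),
    ([0.8, 0.3], [0.1, 0.9], 1),
    ([0.9, 0.1], [1.0, 0.0], -1),
    ([0.7, 0.4], [0.9, 0.1], -1),
]
X = [pair_features(u, v) for u, v, _ in pairs]
y = [label for _, _, label in pairs]
w = train_linear_svm(X, y)
print([predict(w, u, v) for u, v, _ in pairs])
```

Because the last feature is a constant, the bias is regularised along with the other weights, a common simplification in sketches like this one.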

    Anti-social media

    To inform the discussion over free speech and hate speech, this study examines the way racial, religious and ethnic slurs are employed on Twitter. Executive summary: How to define the limits of free speech is a central debate in most modern democracies. This is particularly difficult in relation to hateful, abusive and racist speech. The pattern of hate speech is complex. But there is increasing focus on the volume and nature of hateful or racist speech taking place online, and new modes of communication mean it is easier than ever to find and capture this type of language. How and whether to respond to certain types of language use without curbing freedom of expression in this online space is a significant question for policy makers, civil society groups, law enforcement agencies and others. This short study aims to inform these difficult decisions by examining specifically the way racial and ethnic slurs (henceforth, ‘slurs’) are used on the popular microblogging site, Twitter. Slurs are a set of words, terms, or nicknames which are used to refer to groups in a society in a derogatory, pejorative or insulting manner. Slurs can be used in a hateful way, but that is not always the case. Therefore, this research is not about hate speech per se, but about epistemology and linguistics: word use and meaning. In this study, we aim to answer the following two questions: (a) In what ways, and in what volume, are slurs being used on Twitter? (b) What is the potential for automated machine learning techniques to accurately identify and classify slurs?

    Aligning packed dependency trees: a theory of composition for distributional semantics

    We present a new framework for compositional distributional semantics in which the distributional contexts of lexemes are expressed in terms of anchored packed dependency trees. We show that these structures have the potential to capture the full sentential contexts of a lexeme and provide a uniform basis for the composition of distributional knowledge in a way that captures both mutual disambiguation and generalization.

    Improving Semantic Composition with Offset Inference

    Count-based distributional semantic models suffer from sparsity due to unobserved but plausible co-occurrences in any text collection. This problem is amplified for models such as Anchored Packed Trees (APTs), which take the grammatical type of a co-occurrence into account. We therefore introduce a novel form of distributional inference that exploits the rich type structure in APTs and infers missing data by the same mechanism that is used for semantic composition. Comment: to appear at ACL 2017 (short papers).

    Improving sparse word representations with distributional inference for semantic composition

    Distributional models are derived from co-occurrences in a corpus, where only a small proportion of all plausible co-occurrences will be observed. This results in a very sparse vector space, requiring a mechanism for inferring missing knowledge. Most methods address this challenge in ways that render the resulting word representations uninterpretable, with the consequence that semantic composition becomes hard to model. In this paper we explore an alternative which involves explicitly inferring unobserved co-occurrences using the distributional neighbourhood. We show that distributional inference improves sparse word representations on several word similarity benchmarks and demonstrate that our model is competitive with the state of the art for adjective-noun, noun-noun and verb-object compositions while being fully interpretable.
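The idea of "explicitly inferring unobserved co-occurrences using the distributional neighbourhood" can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's algorithm: vectors are sparse dictionaries of co-occurrence counts, neighbours are the `k` words with the highest cosine similarity, and inference simply adds the neighbours' counts to the target's vector with equal weight. The feature names (e.g. `dobj:eat`), the toy counts and the function names are hypothetical.

```python
import math
from typing import Dict, List

SparseVector = Dict[str, float]  # feature -> co-occurrence count


def cosine(a: SparseVector, b: SparseVector) -> float:
    dot = sum(a[f] * b[f] for f in a.keys() & b.keys())
    norm_a = math.sqrt(sum(x * x for x in a.values()))
    norm_b = math.sqrt(sum(x * x for x in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def nearest_neighbours(word: str, space: Dict[str, SparseVector], k: int) -> List[str]:
    scored = [(cosine(space[word], vec), other)
              for other, vec in space.items() if other != word]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))  # best first, ties by name
    return [other for _, other in scored[:k]]


def infer(word: str, space: Dict[str, SparseVector], k: int = 2) -> SparseVector:
    """Return a copy of `word`'s vector enriched with its k neighbours' counts."""
    enriched = dict(space[word])
    for neighbour in nearest_neighbours(word, space, k):
        for feature, count in space[neighbour].items():
            enriched[feature] = enriched.get(feature, 0.0) + count
    return enriched


space = {
    "apple": {"dobj:eat": 3.0, "amod:red": 2.0},
    "pear":  {"dobj:eat": 2.0, "amod:ripe": 1.0},
    "car":   {"dobj:drive": 4.0, "amod:red": 1.0},
}
print(infer("apple", space, k=1))
# {'dobj:eat': 5.0, 'amod:red': 2.0, 'amod:ripe': 1.0}
```

After inference, *apple* has a non-zero count for `amod:ripe`, a plausible co-occurrence it never had in the toy data, and every dimension is still a named feature, so the enriched vector stays interpretable.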

    A critique of word similarity as a method for evaluating distributional semantic models

    This paper aims to re-think the role of the word similarity task in distributional semantics research. We argue that while it is a valuable tool, it should be used with care because it provides only an approximate measure of the quality of a distributional model. Word similarity evaluations assume there exists a single notion of similarity that is independent of a particular application. Further, the small size and low inter-annotator agreement of existing data sets make it challenging to find significant differences between models.
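Word similarity benchmarks are conventionally scored by the Spearman rank correlation between a model's similarity scores (often cosine) and the human ratings for the same word pairs. The sketch below computes Spearman's rho as the Pearson correlation of average ranks, so tied ratings are handled. The ratings and scores in the example are invented, and the function names are illustrative.

```python
import math
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for pos in range(start, end + 1):
            ranks[order[pos]] = mean_rank
        start = end + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    return pearson(average_ranks(x), average_ranks(y))


human = [9.0, 7.5, 7.5, 3.0, 1.0]        # invented gold ratings for 5 word pairs
model = [0.82, 0.61, 0.70, 0.15, 0.20]   # invented cosine similarities
print(round(spearman(human, model), 3))  # 0.872
```

With only a handful of pairs, reordering a single pair can move rho substantially, which is one concrete way small data sets make differences between models hard to establish.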

    Disrupting Daesh: measuring takedown of online terrorist material and its impacts

    This report seeks to contribute to public and policy debates on the value of social media disruption activity with respect to terrorist material. We look in particular at aggressive account and content takedown, with the aim of accurately measuring this activity and its impacts. Our findings challenge the notion that Twitter remains a conducive space for Islamic State (IS) accounts and communities to flourish, although IS continues to distribute propaganda through this channel. However, not all jihadists on Twitter are subject to the same high levels of disruption as IS, and we show that there is differential disruption taking place. IS’s and other jihadists’ online activity was never solely restricted to Twitter; Twitter is just one node in a wider jihadist social media ecology. We describe and discuss this, and supply some preliminary analysis of disruption trends in this area.